This section details the external IDs for all the samples we discovered when searching the existing targeted probe data and WES data.
The table below is sortable and filterable. To copy the contents of a cell, such as the link to a file in the Google storage console, triple-click the cell.
This summary lists the external IDs of every sample that failed the depth of coverage QC in the targeted probe pipeline. That QC requires the average gene-level or interval-level coverage to be >=50x.
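The QC rule above can be sketched as follows. This is a minimal illustration and not the pipeline's actual code: it assumes the check is applied to the mean of each sample's interval-level coverage values, and the function name, input layout, and sample IDs are hypothetical.

```python
from statistics import mean

# Threshold stated in the text: average coverage must be >= 50x.
MIN_MEAN_COVERAGE = 50.0


def failed_depth_qc(interval_coverage, threshold=MIN_MEAN_COVERAGE):
    """Return the external IDs whose mean interval coverage is below the threshold.

    `interval_coverage` maps an external ID to a list of per-interval
    coverage values (an assumed layout, used for illustration only).
    """
    return sorted(
        external_id
        for external_id, depths in interval_coverage.items()
        if mean(depths) < threshold
    )


# Made-up coverage values for three hypothetical samples.
example_coverage = {
    "SAMPLE_A": [62.0, 55.5, 71.2],
    "SAMPLE_B": [48.0, 51.0, 40.5],
    "SAMPLE_C": [50.0, 50.0, 50.0],  # exactly 50x passes, since the rule is >=
}

failed_ids = failed_depth_qc(example_coverage)
print(failed_ids)
```

Only the sample whose mean falls below 50x is reported as failed; a sample at exactly 50x passes.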
Select the plot to display from the dropdown menu. There are plots for targeted samples alone, WES samples alone, and all samples combined. A desaturated version of the CN heatmap for all samples combined is included as well. These CN heatmaps were constructed using log2(Segment_Mean) values from the seg file.
To look at any one sample in more detail, see either the corresponding horizontal CN plot in the next section, titled "Copy number horizontal plots", or the CN seg file itself (see either the tables below or the TSV available at the link specified in the "Sample information and identifiers" section). The Segment_Mean value in the seg file itself is not transformed.
Note: The copy number heatmaps will display all available samples given your selection from the dropdown menu. If there are no samples available, the heatmap will be a grey box.
Select the copy number plot you would like to display from the dropdown menu. The dropdown menu includes CN plots from both targeted probe (TSCA and TWIST) and WES data. The source of the data is shown in the title of the image. You can also refer to the table that maps each external ID to the source of its data (see "Sample information and identifiers").
The dropdown menu also includes a merged version of the horizontal copy number maps. This PNG file contains all the horizontal CN plots for the participant in a single place for ease of quick comparison.
The targeted copy number tables contain the seg files for each sample that underwent targeted sequencing. The segment mean in the tables below represents the relative copy number. The relative copy number was calculated following this GATK Somatic CNV calling tutorial. The PON used when calculating the CNV information was created using all the normals from the same sequencing batch, which is usually 8-11 normals for a batch of ~40 tumor samples.
Note that while the horizontal CN plots display the relative copy number, the CN heatmaps display the log2(relative copy number). No pseudocount was included because none of the relative copy number values are ever exactly 0.
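The relationship between the two displays can be sketched as below. This is an illustration under assumptions, not the report's plotting code: only the Segment_Mean column name comes from the text, while the other column names, the sample ID, and the values are made up.

```python
import csv
import io
import math

# A tiny made-up seg file. Segment_Mean holds the untransformed relative copy number.
SEG_TEXT = (
    "Sample\tChromosome\tStart\tEnd\tSegment_Mean\n"
    "SAMPLE_A\t1\t1000\t50000\t1.0\n"
    "SAMPLE_A\t1\t50001\t90000\t2.0\n"
    "SAMPLE_A\t2\t1000\t70000\t0.5\n"
)


def log2_segment_means(seg_handle):
    """Read a tab-separated seg file and add a log2-transformed segment mean."""
    rows = []
    for row in csv.DictReader(seg_handle, delimiter="\t"):
        relative_cn = float(row["Segment_Mean"])
        # No pseudocount is added; this relies on the value never being exactly 0.
        row["log2_Segment_Mean"] = math.log2(relative_cn)
        rows.append(row)
    return rows


segments = log2_segment_means(io.StringIO(SEG_TEXT))
for seg in segments:
    print(seg["Chromosome"], seg["Segment_Mean"], seg["log2_Segment_Mean"])
```

A relative copy number of 1.0 maps to 0 on the heatmap scale, with gains above 0 and losses below it. Because `math.log2` raises an error at 0, omitting a pseudocount is only safe while no segment mean is exactly 0.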
The WES copy number tables contain the seg files for each sample that underwent WES. The segment mean in the tables below represents the relative copy number, as calculated using (TODO:)
Note that while the horizontal CN plots display the relative copy number, the CN heatmaps display the log2(relative copy number). No pseudocount was included because none of the relative copy number values are ever exactly 0.
Below are interactive tables containing select mutation information from the targeted probe data and the WES data. If there were multiple external IDs in either dataset, they have been combined into one table. The external_id column can be used to filter the data so that only the mutations for a single external ID are displayed.
Note that this report only includes samples from the targeted data that pass the depth of coverage QC. Samples that did not pass this QC are not included in this report, and their data is not included in the Google bucket. A list of the samples that failed this QC is included earlier in this document (search for "Table: failed QC external IDs").
Also note that the tables below have been filtered so that the keep column equals True. This means that only the variants that passed the filtering steps in the pipeline are included. The raw mutation TSVs in the Google bucket contain all the variants, regardless of whether keep is True or False, if you are interested in that information.
If you want to know why a mutation you expected to see was filtered out, or you want all of the columns available in the mutation TSV rather than the ones selected here, you can download the raw mutation TSV from the Google bucket. The full TSV contains some boolean columns that, when combined, explain the logic behind whether a variant passes our filters.
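To reproduce the same filtering on a raw mutation TSV, something like the following sketch could be used. The keep and external_id column names come from the text. The gene column, the example rows, and the assumption that keep is stored as the literal text True or False are illustrative only.

```python
import csv
import io

# Made-up stand-in for a raw mutation TSV downloaded from the bucket.
MUTATION_TSV = (
    "external_id\tgene\tkeep\n"
    "EXT_1\tGENE1\tTrue\n"
    "EXT_1\tGENE2\tFalse\n"
    "EXT_2\tGENE3\tTrue\n"
)


def kept_variants(tsv_handle, external_id=None):
    """Return rows where keep is True, optionally restricted to one external ID."""
    reader = csv.DictReader(tsv_handle, delimiter="\t")
    return [
        row
        for row in reader
        # Assumes the boolean is written as text, e.g. "True" or "False".
        if row["keep"].strip().lower() == "true"
        and (external_id is None or row["external_id"] == external_id)
    ]


all_kept = kept_variants(io.StringIO(MUTATION_TSV))
ext1_kept = kept_variants(io.StringIO(MUTATION_TSV), external_id="EXT_1")
print(len(all_kept), [row["gene"] for row in ext1_kept])
```

Dropping the keep condition gives the unfiltered view, which is what you would inspect to find a variant that was filtered out.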
For targeted and WES data, we use the filters built into Mutect1 (GATK v3) and Mutect2 (GATK v4), plus additional filtering logic of our own. For example, we rescue known TCGA hotspots and COSMIC mutations even if Mutect1 or Mutect2 would have filtered these out.
The targeted mutation table contains select columns from the MAF files for each sample that underwent targeted sequencing. The PON used was generated using all the normals we had at the time for the targeted sequencing technology. For example, if we had 45 normals from all the TWIST sequencing so far, we would have a PON with 45 normals. This differs from the PON used for CNVs.
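The rescue idea can be sketched as a boolean combination. This is a simplified illustration and not the pipeline's actual rule: the field names below are hypothetical, and the real logic is defined by the boolean columns in the raw mutation TSV, which may include further conditions.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    # Hypothetical flags; the real TSV column names may differ.
    passed_caller_filters: bool  # passed the Mutect1/Mutect2 built-in filters
    is_tcga_hotspot: bool        # matches a known TCGA hotspot
    is_cosmic: bool              # matches a known COSMIC mutation


def keep_variant(variant):
    """Keep a variant if the caller passed it or if it is a known rescued site."""
    return (
        variant.passed_caller_filters
        or variant.is_tcga_hotspot
        or variant.is_cosmic
    )


rescued = Variant(passed_caller_filters=False, is_tcga_hotspot=True, is_cosmic=False)
dropped = Variant(passed_caller_filters=False, is_tcga_hotspot=False, is_cosmic=False)
print(keep_variant(rescued), keep_variant(dropped))
```

Under this sketch, a variant rejected by the caller is still kept when it matches a known hotspot or COSMIC entry.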